A Low-Depth Monotone Function that is not an 
Approximate Junta 



Daniel M. Kane 
March 8, 2013 



1 Introduction 

In [2], O'Donnell and Servedio show that any monotone function given by 
a depth-$d$ decision tree can be learned to constant accuracy from random 
samples in polynomial time. The impact of this result is somewhat lessened 
by an apparent lack of interesting monotone functions given by low-depth 
decision trees. In particular, it has been conjectured that all such functions 
essentially depend on few variables. 

Conjecture 1. For every $\epsilon > 0$ and every monotone function 
$f : \{0,1\}^n \to \{0,1\}$ given by a depth-$d$ decision tree, there is a 
$k$-junta $g$ with $k = \mathrm{poly}_\epsilon(d)$ so that $f$ and $g$ agree 
on all but an $\epsilon$-fraction of inputs. 

In this note, we disprove the above conjecture by providing an example 
of a monotone low-degree function that is not well approximated by any 
small junta. In particular we prove: 

Theorem 2. There exists a constant $\epsilon > 0$ so that for every positive 
integer $d$, there exists a $k = \exp(\Omega(\sqrt{d}))$ and a monotone function 
$f : \{0,1\}^n \to \{0,1\}$ given by a depth-$d$ decision tree, so that for 
every $k$-junta $g$, $f$ and $g$ disagree on at least an $\epsilon$-fraction 
of inputs. 

In fact it is not hard to show that the bound on $k$ in Theorem 2 is 
tight up to the constant in the exponent. In particular, it is shown in 
[2] that any monotone function given by a depth-$d$ decision tree has total 
influence $I(f) = O(\sqrt{d})$. The main result of [1] says that any boolean 
function $f$ can be $\epsilon$-approximated by a $k$-junta for 
$k = \exp(O(I(f)/\epsilon))$. Combining these results we find that: 

Corollary 3. If $f$ is a monotone function given by a depth-$d$ decision tree, 
and if $\epsilon > 0$, then there is a $k$-junta $g$ that agrees with $f$ on 
all but an $\epsilon$-fraction of inputs for $k = \exp(O(\sqrt{d}/\epsilon))$. 

By this result of [1], we know that any $f$ satisfying the conditions of 
Theorem 2 must not only have near the maximum possible total influence 
for a low-depth monotone function, but also must not be approximable by 
any function with much lower total influence. Because of this restriction, 
our construction will look somewhat similar to a construction of Talagrand 
in [3]. In particular, Talagrand constructs a monotone function $f$ on 
$\{0,1\}^n$ so that on a constant fraction of inputs, $f$ has influence 
$\Omega(\sqrt{n})$. We note that since the total influence of $f$ must be 
$O(\sqrt{n})$, this condition is equivalent to saying that for any subset 
$A \subseteq \{0,1\}^n$ with $|A| = \Omega(2^n)$, 

\[ \sum_{x \in A} |\{i : f(x) \neq f(x^{(i)})\}| = \Omega(|A|\sqrt{n}), \]

where $x^{(i)}$ denotes $x$ with its $i$th coordinate flipped. This is a 
strengthening of the condition that $f$ is not close to any function of 
small total influence. 



2 The Construction 

In order to define the function $f$ with the properties specified by 
Theorem 2, we first introduce some background notation. We let $d$, $t$ and 
$m$ be integers with $t = \Theta(\sqrt{d})$ and $m = \Theta(2^t)$. We 
furthermore assume that $2^{-t}m$ is sufficiently small given the value of 
$t/\sqrt{d}$. We let $S = (S_1, \ldots, S_m)$ be a random sequence of sets, 
where the $S_i$ are chosen independently and uniformly from the set of 
subsets of $\{1, 2, \ldots, d-1\}$ of size exactly $t$. Given this $S$, we 
define the function $T_S$ on $\{0,1\}^{d-1}$ as follows: 

\[ T_S(x_1, \ldots, x_{d-1}) = \{1 \le i \le m : x_j = 1 \text{ for all } j \in S_i\}. \]

We will hereafter abbreviate $T_S$ as $T$, suppressing the explicit 
dependence on $S$, and abbreviate $(x_1, \ldots, x_{d-1})$ by $x$. 
We finally define $f$ as 



\[
f_S(x_1, \ldots, x_{d-1}, y_1, \ldots, y_m) =
\begin{cases}
1 & \text{if } |T(x)| \ge 2, \\
0 & \text{if } |T(x)| = 0, \\
y_i & \text{if } T(x) = \{i\}.
\end{cases}
\]



Again, we will often suppress the dependence of $f$ on $S$. It is clear that 
$f$ is monotone. Furthermore, $f$ is given by a depth-$d$ decision tree, since 
after fixing the values of the $x_i$, the value of $f$ depends on at most one 
more coordinate. In the next section, we show that $f$ cannot be approximated 
by any $k$-junta for small $k$. 
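
The construction above can be made concrete with a short sketch. The code below is an illustration only and is not part of the original note: the function names (`sample_sets`, `T`, `f`) and the parameter values are assumptions chosen for readability, and sets are stored with 1-based indices to match the notation in the text.

```python
import random

def sample_sets(d, t, m, rng):
    """Draw S = (S_1, ..., S_m): independent, uniform t-subsets of {1, ..., d-1}."""
    return [frozenset(rng.sample(range(1, d), t)) for _ in range(m)]

def T(S, x):
    """Return T_S(x): the 1-based indices i with x_j = 1 for all j in S_i.

    x is a tuple of d-1 bits, with x[j-1] holding the coordinate x_j.
    """
    return {i for i, Si in enumerate(S, start=1)
            if all(x[j - 1] == 1 for j in Si)}

def f(S, x, y):
    """Evaluate f_S(x, y) using the three cases of the definition."""
    hit = T(S, x)
    if len(hit) >= 2:
        return 1
    if not hit:
        return 0
    (i,) = hit          # exactly one index i with S_i fully set in x
    return y[i - 1]     # the value is then decided by the single bit y_i

# Illustrative (hypothetical) parameters; the text takes t ~ sqrt(d), m ~ 2^t.
rng = random.Random(0)
d, t, m = 10, 3, 8
S = sample_sets(d, t, m, rng)
```

Evaluating `f` reads the `d - 1` bits of `x` and then at most one bit of `y`, which mirrors the depth-$d$ decision tree described above.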


3 Approximation Bounds 

Theorem 2 will follow from the following proposition: 

Proposition 4. There exists an $\epsilon > 0$ so that for $f_S$ defined as 
above, with constant probability over the choice of $S$, $f_S$ is not 
$\epsilon$-approximated by any $k$-junta for $k = o(2^t)$. 

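Before we begin the proof, we will need one lemma. 

Lemma 5. With $T$ as above, 

\[ \Pr_{S,x}(|T_S(x)| = 1) = \Theta(1). \]

Proof. We will show the further claim that 

\[ \mathbb{E}[|T_S(x)|(2 - |T_S(x)|)] = \Theta(1). \tag{1} \]

Since the term in the expectation is at most 1 and is positive only if 
$|T| = 1$, the left-hand side of (1) is at most $\Pr(|T| = 1)$, so this will 
complete our proof. We note that 

\[
\begin{aligned}
\mathbb{E}[|T_S(x)|] &= \sum_{i=1}^{m} \Pr(i \in T_S(x)) \\
&= \sum_{i=1}^{m} \Pr(x_j = 1 \text{ for all } j \in S_i) \\
&= m 2^{-t}.
\end{aligned}
\]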

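On the other hand, we have that 

\[
\begin{aligned}
\mathbb{E}[|T_S(x)|(|T_S(x)| - 1)] &= \sum_{i \ne j} \Pr(i, j \in T_S(x)) \\
&= \sum_{i \ne j} \Pr(i \in T_S(x)) \Pr(j \in T_S(x) \mid i \in T_S(x)) \\
&= 2^{-t} \sum_{i \ne j} \Pr(x_\ell = 1 \text{ for all } \ell \in S_j \mid x_\ell = 1 \text{ for all } \ell \in S_i).
\end{aligned}
\]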

To compute this conditional probability we let $S_j = \{a_1, \ldots, a_t\}$ 
where the $a_r$ are picked randomly from $\{1, 2, \ldots, d-1\}$ without 
replacement. After fixing the values of $S_i, a_1, \ldots, a_{r-1}$ and 
conditioning on the event that $x_\ell = 1$ for $\ell \in S_i$ and 
$x_{a_1} = \cdots = x_{a_{r-1}} = 1$, we compute the probability that 
$x_{a_r} = 1$. This probability is clearly $1/2$ if $a_r \notin S_i$ and $1$ 
if $a_r \in S_i$. Thus the probability that $x_{a_r} = 1$ is 

\[ \frac{1 + \Pr(a_r \in S_i)}{2} = \frac{1}{2}\left(1 + \frac{|S_i \setminus \{a_1, \ldots, a_{r-1}\}|}{d - r}\right) = \frac{1}{2} + O(t/d). \]

Hence the probability that $j \in T_S(x)$ given that $i \in T_S(x)$ is 

\[ (1/2 + O(t/d))^t = 2^{-t} \exp(O(t^2/d)) = O(2^{-t}). \]

Therefore, we have that 

\[ \mathbb{E}[|T_S(x)|(|T_S(x)| - 1)] = \sum_{i \ne j} 2^{-2t} \exp(O(t^2/d)) \le (2^{-t}m)^2 \exp(O(t^2/d)). \]

Therefore, we have that 

\[
\begin{aligned}
\mathbb{E}[|T_S(x)|(2 - |T_S(x)|)] &= \mathbb{E}[|T_S(x)|] - \mathbb{E}[|T_S(x)|(|T_S(x)| - 1)] \\
&= (2^{-t}m) - (2^{-t}m)^2 \exp(O(t^2/d)) \\
&= (2^{-t}m)\left(1 - (2^{-t}m)\exp(O(t^2/d))\right).
\end{aligned}
\]

As long as $2^{-t}m$ is bounded below by a constant and above by 
$\exp(-O(t^2/d))/2$, this is $\Theta(1)$. □ 
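
The first-moment identity in the proof can be checked exactly on a small instance. The sketch below is illustrative only and not part of the original argument: the helper names and the parameters `d, t, m = 13, 3, 4` are assumptions. It enumerates every $x \in \{0,1\}^{d-1}$ for one sampled $S$ and computes $\mathbb{E}_x|T|$, $\Pr_x(|T| = 1)$ and $\mathbb{E}_x[|T|(2 - |T|)]$.

```python
import random
from itertools import product

def count_hits(S, x):
    """Return |T_S(x)|: the number of sets S_i whose coordinates are all 1 in x."""
    return sum(all(x[j - 1] == 1 for j in Si) for Si in S)

def exact_statistics(S, d):
    """Average over all x in {0,1}^(d-1) for a fixed S.

    Returns (E|T|, Pr(|T| = 1), E[|T|(2 - |T|)]).
    """
    total = 2 ** (d - 1)
    sum_T = num_one = sum_bound = 0
    for x in product((0, 1), repeat=d - 1):
        c = count_hits(S, x)
        sum_T += c
        num_one += (c == 1)
        sum_bound += c * (2 - c)
    return sum_T / total, num_one / total, sum_bound / total

# Hypothetical small parameters, chosen so that full enumeration is cheap.
rng = random.Random(1)
d, t, m = 13, 3, 4
S = [frozenset(rng.sample(range(1, d), t)) for _ in range(m)]
mean_T, pr_one, lower = exact_statistics(S, d)
```

For any fixed $S$ whose sets all have size $t$, `mean_T` equals $m 2^{-t}$, and `lower` never exceeds `pr_one` because $c(2 - c) \le 1$ with equality only at $c = 1$.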

We are now ready to prove Proposition 4. By Lemma 5 we note that 
with constant probability over $S$, $\Pr_x(|T(x)| = 1) = \Theta(1)$. For such 
$S$, we claim that $f$ has the desired property. In particular we claim the 
following: 

Lemma 6. If $f$ is as above and $g$ is a $k$-junta, then 

\[ \Pr(f(x,y) \ne g(x,y)) \ge \frac{\Pr_x(|T(x)| = 1) - k 2^{-t}}{2}. \]

Proof. This follows from the simple observation that after fixing the value 
of $x$, if $T(x) = \{i\}$ and $g$ does not depend on $y_i$, then 
$\Pr_y(f(x,y) \ne g(x,y)) = 1/2$. This is because after further conditioning 
on the values of all $y_j$ for $j \ne i$, $g$ becomes a constant function (by 
assumption) and $f$ takes the values 0 and 1 each with probability $1/2$. 
Therefore we have that 

\[
\begin{aligned}
\Pr(f(x,y) \ne g(x,y)) &\ge \frac{\Pr(T(x) = \{i\} \text{ and } g \text{ does not depend on } y_i)}{2} \\
&= \frac{\Pr(|T(x)| = 1) - \Pr(T(x) = \{i\} \text{ and } g \text{ depends on } y_i)}{2} \\
&= \frac{\Pr(|T(x)| = 1) - \sum_{i : g \text{ depends on } y_i} \Pr(T(x) = \{i\})}{2} \\
&\ge \frac{\Pr(|T(x)| = 1) - \sum_{i : g \text{ depends on } y_i} \Pr(i \in T(x))}{2} \\
&= \frac{\Pr(|T(x)| = 1) - \sum_{i : g \text{ depends on } y_i} 2^{-t}}{2} \\
&\ge \frac{\Pr_x(|T(x)| = 1) - k 2^{-t}}{2}.
\end{aligned}
\]

□ 
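
The observation driving this proof can be checked by brute force on a tiny instance. The sketch below is an illustration under stated assumptions, not part of the original proof: the sets in `S` are fixed by hand (hypothetical values with `d = 7`, `t = 2`, `m = 3`), and the only competitor considered is the best function $g$ that ignores every $y_i$. For such a $g$ the sum over "$i$ : $g$ depends on $y_i$" is empty, so the chain of inequalities predicts a disagreement of at least $\Pr_x(|T(x)| = 1)/2$.

```python
from itertools import product

def hits(S, x):
    """Return the 0-based indices i with x_j = 1 for all j in S_i (1-based j)."""
    return [i for i, Si in enumerate(S) if all(x[j - 1] == 1 for j in Si)]

def f(S, x, y):
    """Evaluate f_S(x, y) using the three-case definition."""
    h = hits(S, x)
    if len(h) >= 2:
        return 1
    if not h:
        return 0
    return y[h[0]]

def pr_single_hit(S, d):
    """Exact Pr_x(|T(x)| = 1) over uniform x in {0,1}^(d-1)."""
    xs = list(product((0, 1), repeat=d - 1))
    return sum(len(hits(S, x)) == 1 for x in xs) / len(xs)

def best_x_only_error(S, d):
    """Disagreement between f and the best g(x) that ignores all of y."""
    m = len(S)
    ys = list(product((0, 1), repeat=m))
    xs = list(product((0, 1), repeat=d - 1))
    err = 0.0
    for x in xs:
        frac_one = sum(f(S, x, y) for y in ys) / len(ys)
        err += min(frac_one, 1 - frac_one)  # g(x) picks the majority value
    return err / len(xs)

# Hypothetical hand-picked instance: three disjoint sets of size t = 2.
d = 7
S = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
p = pr_single_hit(S, d)
e = best_x_only_error(S, d)
```

On this instance the disagreement `e` equals `p / 2`: every $x$ with $|T(x)| = 1$ contributes error exactly $1/2$, and all other $x$ contribute none.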

Proposition 4 and Theorem 2 now follow immediately. 